Abstract. Three variants of multi-threaded IC3 are presented. Each variant has
a fixed number of IC3s running in parallel and communicating by sharing lemmas.
They differ in the degree of synchronization between threads, and in the
aggressiveness with which proofs are checked. The correctness of all three
variants is shown. The variants have unpredictable runtime: on the same input,
the time to find the solution varies randomly across runs, depending on the
thread interleaving. The use of a portfolio of solvers to maximize the
likelihood of a quick solution is investigated. Using the Extreme Value theorem,
the runtime of each variant, as well as of their portfolios, is analysed
statistically. A formula for the portfolio size needed to achieve a verification
time with high probability is derived, and validated empirically. Using a
portfolio of 20 parallel IC3s, speedups of over 300 are observed compared to
sequential IC3 on hardware model checking competition examples.


1  Introduction 

In  recent  years,  IC3  [5]  has  emerged  as  a  leading  algorithm  for  model  checking 
hardware.  It  has  been  refined  [8]  and  incorporated  into  state-of-the-art  tools, 
and  generalized  to  verify  software  [10,6].  Our  interest  is  that  IC3  is  amenable 
to  parallelization  [5],  and  promises  new  approaches  to  enhance  the  capability  of 
model  checking  by  harnessing  the  abundant  computing  power  available  today. 
Indeed,  the  original  IC3  paper  [5]  described  a  parallel  version  of  IC3  informally 
and  reported  on  its  positive  performance.  In  this  paper,  we  build  on  that  work 
and  make  three  contributions. 

First, we formally present three variants of parallel IC3 (IC3SYNC, IC3ASYNC
and IC3PROOF) and prove their correctness. All the variants have some common
features:  (i)  they  consist  of  a  fixed  number  of  threads  that  execute  in  parallel; 
(ii)  each  thread  learns  new  lemmas  and  looks  for  counterexamples  (CEXes)  or 
proofs  as  in  the  original  IC3;  (iii)  all  lemmas  learned  by  a  thread  are  shared  with 
the  other  threads  to  limit  duplicate  work;  and  (iv)  if  any  thread  finds  a  CEX, 
the  overall  algorithm  declares  the  problem  unsafe  and  terminates. 

However,  the  variants  differ  in  the  degree  of  inter-thread  synchronization, 
and  the  frequency  and  technique  for  detecting  proofs,  making  different  trade-offs 
between  the  overhead  and  likelihood  of  proof-detection.  Threads  in  IC3SYNC  (cf. 
Sec.  3.1)  synchronize  after  each  round  of  new  lemma  generation  and  propagation, 
and check for proofs in a centralized manner. Threads in IC3ASYNC (cf. Sec. 3.2)
are completely asynchronous. Proof-detection is decentralized and done by each
thread periodically. Finally, threads in IC3PROOF (cf. Sec. 3.3) are also
asynchronous and perform their own proof detection, but more aggressively than
IC3ASYNC. Each thread saves the most recent set of inductive lemmas it has
constructed. When a thread finds a new set of inductive lemmas, it checks whether
the collection of inductive lemmas across all threads forms an inductive
invariant. In order of increasing overhead (and likelihood) of proof-detection,
the variants are: IC3SYNC, IC3ASYNC, and IC3PROOF. Collectively, we refer to the
variants as IC3PAR.

The  runtime  of  IC3PAR  is  unpredictable  (this  is  a  known  phenomenon  [5]). 
In  essence,  the  number  of  steps  to  arrive  at  a  proof  (or  CEX)  varies  with  the 
thread  interleaving.  We  propose  to  counteract  this  variance  using  a  portfolio  - 
run  several  ic3pars  in  parallel,  and  stop  as  soon  as  any  one  terminates  with  an 
answer.  But  how  large  should  such  a  portfolio  be?  Our  second  contribution  is 
a  statistical  analysis  to  answer  this  question.  Our  insight  is  that  the  runtime  of 
IC3PAR  should  follow  the  Weibull  distribution  [18]  closely.  This  is  because  it  can 
be  thought  of  as  the  minimum  of  the  runtimes  of  the  threads  in  IC3PAR,  which 
are  themselves  independent  and  identically  distributed  (i.i.d.)  random  variables. 
According  to  the  Extreme  Value  theorem  [9],  the  minimum  of  i.i.d.  variables 
converges  to  a  Weibull.  We  empirically  demonstrate  the  validity  of  this  claim. 

Next,  we  hoist  the  same  idea  to  a  portfolio  of  ic3pars.  Again,  the  runtime  of 
the  portfolio  should  be  approximated  well  by  a  Weibull,  since  it  is  the  minimum 
of  the  runtime  of  each  ic3par  in  the  portfolio.  Under  this  assumption,  we  derive 
a  formula  (cf.  Theorem  5)  to  compute  the  portfolio  size  sufficient  to  solve  any 
problem  with  a  specific  probability  and  speedup  compared  to  a  single  ic3par. 
For  example,  this  formula  implies  that  a  portfolio  of  20  IC3pars  has  0.99999 
probability  of  solving  a  problem  in  time  no  more  than  the  “expected  time”  for  a 
single  ic3par  to  solve  it.  We  empirically  show  (cf.  Sec.  5.2)  that  the  predictions 
based  on  this  formula  have  high  accuracy.  Note  that  each  solver  in  the  portfolio 
potentially  searches  for  a  different  proof/CEX.  The  first  one  to  succeed  provides 
the  solution.  In  this  way,  a  portfolio  utilizes  the  power  of  IC3  to  search  for  a 
wide  range  of  proofs/CEXes  without  sacrificing  performance. 

Finally, we implement all three IC3PAR variants, and evaluate them on benchmarks
from the 2014 Hardware Model Checking Competition (HWMCC14) and “TIP”. Using
each variant individually, and in portfolios of size 20, we observe that
IC3PROOF and IC3ASYNC outperform IC3SYNC. Moreover, compared to a purely
sequential IC3, the variants are faster, providing an average speedup of over 6
and a maximum speedup of over 300. We also show that widening the proof search
of IC3 by randomizing its SAT solver is not as effective as parallelization.
Complete details are presented in Section 5.1.

Related Work. The original IC3 paper [5] presents a parallel version informally,
and shows empirically that parallelism can improve verification time. Our
IC3PAR solvers were inspired by this work, but are different. For example, the
parallel IC3 in [5] implements clause propagation by first distributing learned
clauses over all solvers and then propagating them one frame at a time, in lock
step. It also introduces uncertainty in the proof search by randomizing the
backend SAT solver. Our IC3PAR solvers perform clause propagation asynchronously,
and use deterministic SAT solvers. We also present each IC3PAR variant formally
with pseudo-code and prove their correctness. Finally, we perform a statistical
analysis of the runtimes of both IC3PAR solvers and their portfolios.
Experimental results (cf. Sec. 5.1) indicate that a portfolio of IC3PAR solvers
is more efficient than a portfolio composed of IC3 solvers with randomized SAT
solvers.

A number of projects focus on parallelizing model checking [11,4,15,2,3,1].
Ditter et al. [7] have developed GPGPU algorithms for explicit-state model
checking. They do not report on variance in runtime, analyse it statistically
as we do, or explore the use of portfolios. Lopes et al. [13] do address
variance in the runtime of a parallel software model checker. However, their
approach is to make the model checker's runtime more predictable by ensuring
that the counterexample generation procedure is deterministic. They also do not
perform any statistical analysis or explore portfolios.

Portfolios have been used successfully in SAT solving [20,17,12,14], SMT
solving [19] and symbolic execution [16]. However, these portfolios are composed
of  a  heterogeneous  set  of  solvers.  Our  focus  is  on  homogeneous  portfolios  of 
IC3PAR  solvers  and  statistical  analysis  of  their  runtimes. 


2  Preliminaries 


Assume Boolean state variables $V$, and their primed versions $V'$. A verification
problem is $(I, T, S)$ where $I(V)$, $T(V, V')$ and $S(V)$ denote initial states,
transition relation and safe states, respectively. We omit $V$ when it is clear
from the context, and write $S'$ to mean $S(V')$. Let $Post(X)$ denote the image
of $X$ under the transition relation $T$. Let $Post^k(X)$ be the result of
applying $Post(\cdot)$ $k$ times on $X$, with $Post^0(X) = X$, and let
$Post^{k+}(X) = \bigcup_{j \geq k} Post^j(X)$. The problem $(I, T, S)$ is safe if
$Post^{0+}(I) \subseteq S$, and unsafe (a.k.a. buggy) otherwise.
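The safety condition can be made concrete on a tiny explicit-state system. The sketch below is only an illustration: it assumes states are small integers and the transition relation is a set of pairs, and the names `post` and `is_safe` are hypothetical. It enumerates the reachable states directly, which is exactly what a symbolic algorithm such as IC3 avoids doing.

```python
def post(X, trans):
    """Image of the state set X under the transition relation (a set of pairs)."""
    return {t for (s, t) in trans if s in X}


def is_safe(init, trans, safe):
    """Check that every state reachable from init in 0 or more steps lies in safe."""
    reached = set(init)
    frontier = set(init)
    while frontier:
        if not frontier <= safe:
            return False          # some reachable state is outside S
        frontier = post(frontier, trans) - reached
        reached |= frontier
    return True
```

For instance, with initial state 0 and transitions 0 to 1 and 1 to 0, the unreachable transition 2 to 3 does not affect safety even if 3 is an unsafe state.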

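A random variable $X$ has a Weibull distribution with shape $k$ and scale
$\lambda$, denoted $X \sim \mathrm{WEI}(k, \lambda)$, iff its probability density
function (pdf) $f_X$ and cumulative distribution function (cdf) $F_X$ are defined
as follows:

$$f_X(x) = \begin{cases} \frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k} & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases} \qquad F_X(x) = \begin{cases} 1 - e^{-(x/\lambda)^k} & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases}$$

Let $X_1, \ldots, X_n$ be i.i.d. random variables (rvs) whose pdfs are lower
bounded at zero, i.e., $\forall x < 0 \,.\, f_{X_i}(x) = 0$. Then, by the Extreme
Value theorem [9] (EVT), the pdf of the rv $X = \min(X_1, \ldots, X_n)$ converges
to a Weibull as $n \to \infty$.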
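The pdf and cdf above translate directly into code. The following is a minimal sketch; the function names are illustrative, and the mean formula $\lambda \cdot \Gamma(1 + 1/k)$ is the standard closed form for a Weibull rv rather than something derived in this section.

```python
import math


def weibull_pdf(x, k, lam):
    """Density of WEI(k, lam); zero for negative x."""
    if x < 0:
        return 0.0
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))


def weibull_cdf(x, k, lam):
    """Probability that a WEI(k, lam) variable is at most x."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-((x / lam) ** k))


def weibull_mean(k, lam):
    """Expected value of WEI(k, lam), i.e. lam * Gamma(1 + 1/k)."""
    return lam * math.gamma(1.0 + 1.0 / k)
```

Note that `weibull_pdf(0, k, lam)` is only well defined here for $k \geq 1$; for $k < 1$ the density diverges at zero and the expression raises an error.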


3  Parallelizing  IC3 

We begin with a description of the sequential IC3 algorithm. Fig. 1 shows its
pseudo-code.

 1  //-- global variables
 2  var (I, T, S) : problem (P)
 3  var F : frame []  (array of frames)
 4  var K : int  (size of F)
 5  var bug : bool  (CEX flag)
 6
 7  //-- invariants
 8  ∀i ∈ [0, K−1], let f(i) = ⋀_{j ∈ [i,K−1]} ⋀_{α ∈ F[j]} α
 9  A1 : ∀i ∈ [0, K−1] . I ⇒ f(i)
10  A2 : ∀i ∈ [0, K−2] . f(i) ∧ T ⇒ f'(i+1)
11  A3 : ∀i ∈ [0, K−3] . f(i) ∧ T ⇒ S'
12  A4 : ∀i ∈ [0, K−2] . f(i) ∧ T ⇒ S'
13
14  //-- main function.
15  bool IC3()
16    if (I ∧ ¬S ≠ ⊥) ∨ (I ∧ T ∧ ¬S' ≠ ⊥)
17      return ⊥;
18    K := 3; F[0] := I; F[1] := ∅;
19    F[2] := ∅; bug := ⊥;
20    while (⊤)
21      @INV{ℐ1 : A1 ∧ A2 ∧ A3}
22      strengthen(F, K);
23      @INV{ℐ2 : bug ∨ (A1 ∧ A2 ∧ A4)}
24      if (bug) return ⊥;
25      @INV{ℐ3 : A1 ∧ A2 ∧ A4}
26      propagate(F, K);
27      if (∃i ∈ [1, K−2] . F[i] = ∅)
28        return ⊤;
29      @INV{ℐ3}
30      F[K] := ∅; K := K + 1;
31  //-- add new lemmas to frames, stop
32  //-- with a CEX or when A4 holds.
33  void strengthen(F, K)
34    var PQ : priority queue
35    while (⊤)
36      if (f(K−2) ∧ T ⇒ S') return;
37      let m ⊨ f(K−2) ∧ T ∧ ¬S';
38      PQ.insert(m, K−3);
39      while (¬PQ.empty())
40        (m, l) := PQ.top();
41        if (f(l) ∧ T ∧ m' = ⊥)
42          F[l+1] := F[l+1] ∪ {¬m};
43          PQ.erase(m, l);
44        else if (l = 0)
45          bug := ⊤; return;
46        else
47          let m̂ ⊨ f(l) ∧ T ∧ m';
48          PQ.insert(m̂, l−1);
49
50  //-- push inductive clauses forward.
51  //-- check for proof of safety.
52  void propagate(F, K)
53    for i : 1 ... K−2
54      for α ∈ F[i]
55        if (f(i) ∧ T ⇒ α')
56          F[i+1] := F[i+1] ∪ {α};
57          F[i] := F[i] \ {α};

Fig. 1. Pseudo-code for IC3. Variables are passed by reference.

IC3 works as follows: (i) check that no state in $\neg S$ is reachable in 0 or 1
steps from some state in $I$ (lines 16-17); (ii) iteratively construct an array
of frames, each consisting of a set of clauses, as follows: (a) initialize the
frame array and flags (lines 18-19); (b) strengthen() the frames by adding new
clauses (line 22); if a counterexample is found in this step (indicated by bug
being set), IC3 terminates (line 24); (c) otherwise, propagate() clauses that
are inductive to the next frame (line 26); if a proof of safety is found
(indicated by an empty frame), IC3 again terminates (lines 27-28); (d) add a new
empty frame to the end of the array (line 30) and repeat from step (b).
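To make the control flow of Fig. 1 easier to follow, here is a toy explicit-state rendering of it. This is a sketch under strong assumptions, not the implementation evaluated in the paper: states are comparable values such as small integers, the transition relation is a set of pairs, every SAT query is replaced by brute-force enumeration, and a lemma is stored as the set of states that satisfy it (so the lemma $\neg m$ is the set of all states except $m$). The name `ic3` and its argument layout are hypothetical.

```python
def ic3(states, init, trans, safe):
    """Toy explicit-state version of Fig. 1: True means safe, False means unsafe."""
    states, init, safe = frozenset(states), frozenset(init), frozenset(safe)

    def post(s):
        return {t for (u, t) in trans if u == s}

    def f(F, i):
        # f(i): states satisfying every lemma stored in frames i..K-1.
        result = set(states)
        for frame in F[i:]:
            for lemma in frame:
                result &= lemma
        return result

    def strengthen(F):
        K = len(F)
        while True:
            bad = [m for m in f(F, K - 2) if not post(m) <= safe]
            if not bad:
                return True                        # line 36: A4 holds
            queue = [(K - 3, min(bad))]            # line 38, kept as (level, state)
            while queue:
                queue.sort()                       # lowest level first
                l, m = queue[0]
                preds = [s for s in f(F, l) if m in post(s)]
                if not preds:                      # line 41: no predecessor in f(l)
                    F[l + 1].add(states - {m})     # line 42: learn the lemma "not m"
                    queue.pop(0)                   # line 43
                elif l == 0:
                    return False                   # line 45: counterexample found
                else:
                    queue.append((l - 1, min(preds)))   # lines 47-48

    def propagate(F):
        K = len(F)
        for i in range(1, K - 1):                  # i = 1 .. K-2
            for lemma in list(F[i]):
                if all(post(s) <= lemma for s in f(F, i)):
                    F[i + 1].add(lemma)            # line 56 (add before removing)
                    F[i].discard(lemma)            # line 57

    if not init <= safe or any(not post(s) <= safe for s in init):
        return False                               # lines 16-17
    F = [{init}, set(), set()]                     # lines 18-19, with K = len(F)
    while True:
        if not strengthen(F):
            return False                           # line 24
        propagate(F)
        if any(not F[i] for i in range(1, len(F) - 1)):
            return True                            # lines 27-28
        F.append(set())                            # line 30
```

On the system with initial state 0 and transitions 0 to 1, 1 to 0, 2 to 3, where only state 3 is unsafe, the sketch learns the single lemma "not 2", pushes it forward, and reports safe; replacing the transitions by the chain 0 to 1 to 2 to 3 makes it report unsafe once a fourth frame is added.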

Definition 1 (Frame Monotonicity). A function is frame monotonic if at each
point during its execution, $\forall i \in [0, K-1] \,.\, f(i) \Rightarrow \tilde{f}(i)$,
where $\tilde{f}(i)$ is the value of $f(i)$ when the function was called.

Correctness. Fig. 1 also shows the invariants (indicated by @INV) before and
after strengthen() and propagate(). Since strengthen() always adds new lemmas
to frames, it is frame monotonic, and hence it maintains $A_1$ and $A_3$. It
also maintains $A_2$ since a new lemma $\neg m$ is added to frame $F[l+1]$
(line 42) only if $f(l) \land T \Rightarrow \neg m'$ (line 41). Finally, when
strengthen() returns, then either $bug = \top$ (line 45), or
$f(K-2) \land T \Rightarrow S'$ (line 36). Hence $\mathcal{I}_2$ is a valid
post-condition for strengthen(). Also, propagate() is frame monotonic since it
always pushes inductive lemmas forward (the order of the two statements at
lines 56-57 is crucial for this). Hence, propagate() maintains $A_1$ and $A_4$
at all times. It also maintains $A_2$ since a new lemma $\alpha$ is added to
frame $F[i+1]$ (line 56) only if $f(i) \land T \Rightarrow \alpha'$ (line 55).
Hence $\mathcal{I}_3$ is a valid post-condition for propagate(). Finally, note
that $A_4 = A_3 \land (f(K-2) \land T \Rightarrow S')$. Hence, after $K$ is
incremented, $A_4$ becomes $A_3$. Also, since the last frame is initialized to
$\emptyset$, $A_1$ and $A_2$ are preserved. Hence:
$\{\mathcal{I}_3\}\ F[K] := \emptyset;\ K := K + 1;\ \{\mathcal{I}_1\}$. The
correctness of IC3 is summarized by Theorem 1. Its proof is in Appendix A.

58  //-- global variables
59  var (I, T, S) : problem (P)
60  var ∀i ∈ [1, n] . F_i : frame []
61  var K : int  (size of each F_i)
62  var bug : bool  (CEX flag)
63
64  //-- invariants
65  ∀j ∈ [0, K−1], let
66    f(j) = ⋀_{i ∈ [1,n]} ⋀_{k ∈ [j,K−1]} ⋀_{α ∈ F_i[k]} α
67
68  B1 : ∀j ∈ [0, K−1] . I ⇒ f(j)
69  B2 : ∀j ∈ [0, K−2] . f(j) ∧ T ⇒ f'(j+1)
70  B3 : ∀j ∈ [0, K−3] . f(j) ∧ T ⇒ S'
71  B4 : ∀j ∈ [0, K−2] . f(j) ∧ T ⇒ S'
72  bool IC3Sync(n)
73    if (I ∧ ¬S ≠ ⊥) ∨ (I ∧ T ∧ ¬S' ≠ ⊥)
74      return ⊥;
75    K := 3; bug := ⊥;
76    ∀i ∈ [1, n] . F_i[0] := I; F_i[1] := F_i[2] := ∅;
77    while (⊤)
78      @INV{ℐ4 : B1 ∧ B2 ∧ B3}
79      {strengthen(F_1, K); propagate(F_1, K)}
80        ‖ ··· ‖
81      {strengthen(F_n, K); propagate(F_n, K)}
82      @INV{ℐ5 : bug ∨ (B1 ∧ B2 ∧ B4)}
83      if (bug) return ⊥;
84      @INV{ℐ6 : B1 ∧ B2 ∧ B4}
85      if (∃j ∈ [1, K−2] . ∀i ∈ [1, n] . F_i[j] = ∅)
86        return ⊤;
87      @INV{ℐ6}
88      ∀i ∈ [1, n] . F_i[K] := ∅; K := K + 1;

Fig. 2. Pseudo-code for IC3SYNC. Variables are passed by reference. Functions
strengthen() and propagate() are defined in Fig. 1.

Theorem 1. If IC3() returns $\top$, then the problem is safe. If IC3() returns
$\bot$, then the problem is unsafe.

We now present the three versions of parallel IC3 and their correctness (their
termination follows in the same way as for IC3 [5]; see Theorem 5 in Appendix A).


3.1  Synchronous  Parallel  IC3 

The first parallelized version of IC3, denoted IC3SYNC, runs a number of copies
of the sequential IC3 “synchronously” in parallel. Let IC3SYNC(n) be the
instance of IC3SYNC consisting of n copies of IC3 executing concurrently. The
copies maintain separate frames. However, for any copy, the frames of other
copies act as “background lemmas”. Specifically, the copies interact by:
(i) using the frames of all other copies when computing $f(j)$; (ii) declaring
the problem unsafe if any copy finds a counterexample; (iii) declaring the
problem safe if some frame becomes empty across all the copies; and
(iv) “synchronizing” after each call to strengthen() and propagate().

The pseudo-code for IC3SYNC(n) is shown in Fig. 2. The main function is
IC3Sync(). After checking the base cases (lines 73-74), it initializes flags and
frames (lines 75-76), and then iteratively performs the following steps: (i) run
n copies of IC3, where each copy does a single step of strengthen() followed by
propagate() (lines 79-81); (ii) check if any copy of IC3 found a counterexample,
and if so, terminate (line 83); (iii) check if a proof of safety has been found,
and if so, terminate (lines 85-86); and (iv) add a frame and repeat from step
(i) above (line 88). Functions strengthen() and propagate() are syntactically
identical to IC3 (cf. Fig. 1). However, the key semantic difference is that
lemmas from all copies are used to define $f(j)$ (lines 65-66). Global variables
are shared, and accessed atomically. Note that even though all IC3 copies write
to variable bug, there is no race condition since they always write the same
value ($\top$).

 89  //-- invariants
 90  ∀j ∈ [0, max(K_1, ..., K_n) − 1], let
 91    f(j) = ⋀_{i ∈ [1,n]} ⋀_{k ∈ [j,K_i−1]} ⋀_{α ∈ F_i[k]} α
 92
 93  C1 : ∀j ∈ [0, K_i−1] . I ⇒ f(j)
 94  C2 : ∀j ∈ [0, K_i−2] . f(j) ∧ T ⇒ f'(j+1)
 95  C3 : ∀j ∈ [0, K_i−3] . f(j) ∧ T ⇒ S'
 96  C4 : ∀j ∈ [0, K_i−2] . f(j) ∧ T ⇒ S'
 97
 98
 99
100  //-- top-level function
101  bool IC3Async(n)
102    if (I ∧ ¬S ≠ ⊥) ∨ (I ∧ T ∧ ¬S' ≠ ⊥)
103      return ⊥;
104    bug := ⊥;
105    IC3Copy(1) ◇ ··· ◇ IC3Copy(n);
106    return bug ? ⊥ : ⊤;
107  //-- global variables
108  var (I, T, S) : problem (P)
109  var ∀i ∈ [1, n] . F_i : frame []
110  var ∀i ∈ [1, n] . K_i : int  (size of F_i)
111  var bug : bool  (CEX flag)
112
113  void IC3Copy(i)
114    K_i := 3; F_i[0] := I;
115    F_i[1] := ∅; F_i[2] := ∅;
116    while (⊤)
117      @INV{ℐ7 : C1 ∧ C2 ∧ C3}
118      strengthen(F_i, K_i);
119      @INV{ℐ8 : bug ∨ (C1 ∧ C2 ∧ C4)}
120      if (bug) return;
121      @INV{ℐ9 : C1 ∧ C2 ∧ C4}
122      propagate(F_i, K_i);
123      if (∃j ∈ [1, K_i−2] . ∀i ∈ [1, n] . F_i[j] = ∅)
124        return;
125      @INV{ℐ9}
126      F_i[K_i] := ∅; K_i := K_i + 1;

Fig. 3. Pseudo-code for IC3ASYNC. Variables are passed by reference. Functions
strengthen() and propagate() are defined in Fig. 1.
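The two places where IC3SYNC copies interact through their frames, the shared definition of $f(j)$ (Fig. 2, lines 65-66) and the centralized proof check (line 85), can be sketched as follows. The representation is an assumption made for illustration: each copy's frame array is a Python list of sets, each lemma is the set of states that satisfy it, and the names `shared_f` and `proof_found` are hypothetical.

```python
def shared_f(copies, j, universe):
    """f(j): states satisfying every lemma in frames j..K-1 of every copy."""
    result = set(universe)
    for F in copies:              # F is the frame array of one copy
        for frame in F[j:]:
            for lemma in frame:
                result &= lemma
    return result


def proof_found(copies):
    """Line 85: some frame index j in [1, K-2] is empty in every copy."""
    K = len(copies[0])            # IC3SYNC keeps a single K for all copies
    return any(all(not F[j] for F in copies) for j in range(1, K - 1))
```

Because `shared_f` intersects over all copies, a lemma learned by one copy immediately strengthens the frames seen by the others, which is how duplicate work is limited.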

Correctness. The correctness of IC3SYNC follows from the invariants specified
in Fig. 2. To show these invariants are valid, the main challenge is to show
that if $\mathcal{I}_4$ holds at line 78, then $\mathcal{I}_5$ holds at line 82.
Note that since strengthen() and propagate() are frame monotonic, they preserve
$B_1$ and $B_3$. This means that $B_1 \land B_3$ holds at line 82. Now suppose
that at line 82, we have $\neg bug$. This means that each strengthen() called at
lines 79-81 returned from line 36. Thus, the condition
$f(K-2) \land T \Rightarrow S'$ was established at some point, and once
established, it continues to hold due to the frame monotonicity of strengthen()
and propagate(). Since $B_4 = B_3 \land (f(K-2) \land T \Rightarrow S')$, we
therefore know that $B_1 \land B_4$ holds at line 82. Also, $B_2$ holds at line
82 since a new lemma $\alpha$ is only added to frame $F_i[j+1]$ by strengthen()
(line 42) and propagate() (line 56) under the condition
$f(j) \land T \Rightarrow \alpha'$. Note that once
$f(j) \land T \Rightarrow \alpha'$ is true, it continues to hold even in the
concurrent setting due to frame monotonicity. Finally, the statement at line 88
transforms $\mathcal{I}_6$ to $\mathcal{I}_4$. The correctness of IC3SYNC is
summarized by Theorem 2. Its proof is in Appendix A.

Theorem 2. If IC3Sync() returns $\top$, then the problem is safe. If IC3Sync()
returns $\bot$, then the problem is unsafe.


3.2  Asynchronous  Parallel  IC3 

The next parallelized version of IC3, denoted IC3ASYNC, runs a number of copies
of the sequential IC3 “asynchronously” in parallel. Let IC3ASYNC(n) be the
instance of IC3ASYNC consisting of n copies of IC3 executing concurrently.
Similar to IC3SYNC, the copies maintain separate frames, interact by sharing
lemmas when computing $f(j)$, and declare the problem unsafe if any copy finds a
counterexample. However, due to the lack of synchronization, proof detection is
distributed over all the copies instead of being centralized in the main thread.

127  //-- global variables
128  var (I, T, S) : problem (P)
129  var ∀i ∈ [1, n] . F_i, P_i : frame []
130  var ∀i ∈ [1, n] . K_i : int  (size of F_i and P_i)
131  var bug, safe : bool  (CEX and proof flags)
132
133
134  void IC3PrCopy(i)
135    K_i := 3; F_i[0] := I;
136    F_i[1] := ∅; F_i[2] := ∅;
137    while (⊤)
138      @INV{ℐ7 : C1 ∧ C2 ∧ C3}
139      strengthen(F_i, K_i);
140      @INV{ℐ8 : bug ∨ (C1 ∧ C2 ∧ C4)}
141      if (bug) return;
142      @INV{ℐ9 : C1 ∧ C2 ∧ C4}
143      propProof(F_i, K_i);
144      if (safe) return;
145      @INV{ℐ9}
146      F_i[K_i] := ∅; K_i := K_i + 1;
147  bool IC3Proof(n)
148    if (I ∧ ¬S ≠ ⊥) ∨ (I ∧ T ∧ ¬S' ≠ ⊥)
149      return ⊥;
150    bug := ⊥; safe := ⊥;
151    IC3PrCopy(1) ◇ ··· ◇ IC3PrCopy(n);
152    return bug ? ⊥ : ⊤;
153
154  void propProof(F, K)
155    for j : 1 ... K−2
156      for α ∈ F[j]
157        if (f(j) ∧ T ⇒ α')
158          F[j+1] := F[j+1] ∪ {α};
159          F[j] := F[j] \ {α};
160      if (F[j] = ∅)
161        P_i := ⋃_{j ≤ k ≤ K−1} F[k];
162        Π := ⋃_{ {i | 1 ≤ i ≤ n ∧ j < K_i} } P_i;
163        if (Π ∧ T ⇒ Π')
164          safe := ⊤; return;

Fig. 4. Pseudo-code for IC3PROOF. Variables are passed by reference. Function
strengthen() is defined in Fig. 1. Formulas f(j), ℐ7, ℐ8, and ℐ9 are defined in
Fig. 3.

Fig. 3 shows the pseudo-code for IC3ASYNC(n). The main function is IC3Async().
After checking the base cases (lines 102-103), it initializes flags (line 104),
launches n copies of IC3 in parallel and waits for some copy to terminate (the
$\diamond$ operator on line 105), and then checks the flag and returns with an
appropriate result (line 106). Function IC3Copy() is similar to IC3() in Fig. 1.
The key difference is that lemmas from all copies are used to compute $f(j)$
(lines 90-91).
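The launch-and-wait structure of IC3Async() (lines 104-106) can be sketched with Python's standard thread pool. This is an illustration only: `copy_body` stands in for IC3Copy() and is a hypothetical callback, the `stop` event is an addition that lets the remaining copies wind down once one of them has terminated, and the frame sharing itself is not modelled.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_async(n, copy_body):
    """Launch n copies and return once one of them terminates.

    copy_body(i, flags, stop) is a hypothetical worker: it sets
    flags["bug"] = True if it finds a counterexample, and it should
    return promptly once the stop event is set.
    """
    flags = {"bug": False}                 # line 104: bug := false
    stop = threading.Event()
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(copy_body, i, flags, stop) for i in range(1, n + 1)]
        wait(futures, return_when=FIRST_COMPLETED)   # line 105: first copy to finish
        stop.set()                         # ask the other copies to stop
    return not flags["bug"]                # line 106: True = safe, False = unsafe


def toy_copy(i, flags, stop):
    """Stand-in worker: copy 2 reports a counterexample, the others idle."""
    if i == 2:
        flags["bug"] = True                # copies only ever write True to the flag
        return
    stop.wait(timeout=5)
```

As in the pseudo-code, the flag is only ever written with the same value, so the order in which copies write it does not matter.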

Correctness. The correctness of IC3ASYNC follows from the invariants specified
in Fig. 3. To see why these invariants are valid, note that $C_1$ and $C_3$ are
always preserved due to frame monotonicity. If strengthen() returns with
$bug = \bot$, then it returned from line 36, and hence
$f(K_i - 2) \land T \Rightarrow S'$ was true at some point in the past and
continues to hold due to frame monotonicity. Together with $C_3$, this implies
that $C_4$ holds at line 119. Also, $C_2$ holds at line 119 since a new lemma
$\alpha$ is only added to frame $F_i[j+1]$ by strengthen() (line 42) and
propagate() (line 56) under the condition $f(j) \land T \Rightarrow \alpha'$.
Note that once $f(j) \land T \Rightarrow \alpha'$ is true, it continues to hold
even under concurrency due to frame monotonicity. Hence, $\mathcal{I}_8$ holds at
line 119. Since bug is never reset to $\bot$ by any copy, this means that
$\mathcal{I}_9$ holds at line 121 even under concurrency. Finally, the statement
at line 126 transforms $\mathcal{I}_9$ to $\mathcal{I}_7$. The correctness of
IC3ASYNC is summarized by Theorem 3. Its proof is in Appendix A.

Theorem 3. If IC3Async() returns $\top$, then the problem is safe. If IC3Async()
returns $\bot$, then the problem is unsafe.


3.3  Asynchronous  Parallel  IC3  With  Proof-Checking 


The final parallelized version of IC3, denoted IC3PROOF, is similar to IC3ASYNC,
but adds more aggressive checking for proofs. Let IC3PROOF(n) be the instance
of IC3PROOF consisting of n copies of IC3 executing concurrently. Similar to
IC3ASYNC, the copies maintain separate frames, interact by sharing lemmas when
computing $f(j)$, and declare the problem unsafe if any copy finds a
counterexample. However, whenever a copy finds an empty frame, it checks whether
the set of lemmas over all the copies for the frame forms an inductive invariant.

The pseudo-code for IC3PROOF(n) is shown in Fig. 4. The main function is
IC3Proof(). After checking the base cases (lines 148-149), it initializes flags
(line 150), launches n copies of IC3 in parallel (line 151) and waits for at
least one copy to terminate, and then checks the flag and returns with an
appropriate result (line 152). Each copy of IC3 is similar to the sequential IC3
in Fig. 1. The key difference is in propProof() where, once an empty frame is
detected (line 160), we check whether a proof has been found by collecting the
lemmas for the frame (lines 161-162), and checking if these lemmas are inductive
(line 163).
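The extra check performed by propProof() (lines 162-163) amounts to asking whether the conjunction of the lemma sets saved by all copies is closed under the transition relation. A minimal explicit-state sketch follows, assuming as before that a lemma is the set of states satisfying it and that the transition relation is a set of pairs. The names are hypothetical, and the index condition on line 162 is omitted for brevity.

```python
def conjunction(lemmas, universe):
    """States that satisfy every lemma in the collection."""
    result = set(universe)
    for lemma in lemmas:
        result &= lemma
    return result


def lemmas_are_inductive(saved, trans, universe):
    """Lines 162-163: pool the saved sets P_i and test that Pi and T imply Pi'.

    saved is a list with one collection of lemmas per copy.
    """
    pooled = set()
    for P in saved:               # line 162: union of the per-copy lemma sets
        pooled |= set(P)
    inv = conjunction(pooled, universe)
    return all(t in inv for (s, t) in trans if s in inv)   # line 163
```

The example in the test below shows why pooling helps: with transitions 0 to 1, 1 to 0, 2 to 3 and 3 to 2, neither "not 2" nor "not 3" is inductive on its own, but together they are.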

Correctness.  The  correctness  of  IC3PROOF  follows  from  the  invariants  (whose 
validity  is  similar  to  those  for  IC3ASYNC)  specified  in  Fig.  4.  It  is  summarized 
by  Theorem  4.  The  proof  of  the  theorem  is  in  Appendix  A. 

Theorem 4. If IC3Proof() returns $\top$, then the problem is safe. If IC3Proof()
returns $\bot$, then the problem is unsafe.

4  Parallel  IC3  Portfolios 

In this section, we investigate how a good portfolio size can be selected if we
want to implement a portfolio of IC3PARs. We begin with an argument about the
pdf of the runtime of IC3ASYNC(n).

Conjecture 1. The runtime of IC3ASYNC(n) converges to a Weibull rv as
$n \to \infty$.

Argument: Recall that each execution of IC3ASYNC(n) consists of n copies of IC3
running in parallel, and that IC3ASYNC(n) stops as soon as one copy finds an
answer. We can consider the runtime of each copy of IC3 to be a rv.
Specifically, let $X_i$ be the rv denoting the runtime of the $i$-th copy of IC3
assuming it was allowed to run till completion. Recall that the pdf of $X_i$ has
a lower bound of 0, since no run of IC3 can take negative time. Also, the random
variables $X_1, \ldots, X_n$ are i.i.d. since the copies of IC3 only interact
with each other logically. Finally, let $X$ be the random variable denoting the
runtime of IC3ASYNC(n). Note that $X = \min(X_1, \ldots, X_n)$. Hence, by the
EVT, $X \sim \mathrm{WEI}(k, \lambda)$ for large $n$. □

A similar argument holds for IC3SYNC and IC3PROOF, and therefore their runtime
should follow Weibull as well. In the rest of this section, we write IC3PAR to
mean a specific parallel IC3 variant. Empirically, we find that the runtime of
IC3PAR(n) follows a Weibull distribution closely for even modest values of n.
Specifically, we selected 10 examples (5 safe and 5 buggy) from HWMCC14, and for
each example we: (i) executed IC3ASYNC(4) around 3000 times; (ii) measured the
runtimes; (iii) estimated the $k$ and $\lambda$ values for the Weibull
distribution that best fits these values; and (iv) computed the observed mean
and standard deviation from the data, and the predicted mean and standard
deviation from the $k$ and $\lambda$ estimates. We repeated these experiments
with IC3SYNC and IC3PROOF.

Fig. 5. Fitting IC3PAR(4) runtime to Weibull. First 5 examples are safe, next 5
are buggy; SAFE, BUG, ALL = average over safe, buggy, and all examples;
$\mu$, $\mu^*$ = predicted, observed mean; $\sigma$, $\sigma^*$ = predicted,
observed standard deviation.
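Steps (iii) and (iv) of the procedure above can be sketched as follows. The text does not say which estimator was used, so this sketch assumes a standard maximum-likelihood fit solved by bisection; all function names are hypothetical, and `synthetic_runtimes` merely draws stand-in data from a known Weibull distribution in place of measured runtimes.

```python
import math
import random


def fit_weibull(samples):
    """Maximum-likelihood estimate (k, lam) for positive, non-identical samples."""
    top = max(samples)
    ys = [x / top for x in samples]        # rescale so that y ** k cannot overflow
    mean_log = sum(math.log(y) for y in ys) / len(ys)

    def g(k):
        # The root of g is the MLE of the shape parameter.
        w = [y ** k for y in ys]
        return sum(wi * math.log(y) for wi, y in zip(w, ys)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 200.0
    for _ in range(100):                   # bisection; g is increasing in k
        mid = (lo + hi) / 2.0
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2.0
    lam = top * (sum(y ** k for y in ys) / len(ys)) ** (1.0 / k)
    return k, lam


def predicted_mean_std(k, lam):
    """Mean and standard deviation of WEI(k, lam)."""
    g1 = math.gamma(1.0 + 1.0 / k)
    g2 = math.gamma(1.0 + 2.0 / k)
    return lam * g1, lam * math.sqrt(g2 - g1 * g1)


def synthetic_runtimes(n, k, lam, seed):
    """Stand-in for measured runtimes: n draws from WEI(k, lam)."""
    rng = random.Random(seed)
    return [rng.weibullvariate(lam, k) for _ in range(n)]   # (scale, shape) order
```

Comparing `predicted_mean_std(*fit_weibull(data))` with the sample mean and standard deviation of `data` reproduces the kind of predicted versus observed comparison described in the Fig. 5 caption.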

The results are shown in Fig. 5(a). We see that in all cases, the observed mean
and standard deviation are quite close to the predicted ones, indicating that
the estimated Weibull distribution is a good fit for the measured runtimes.
IC3ASYNC and IC3PROOF have similar performance, and are slightly faster overall
than IC3SYNC, indicating that additional synchronization is counter-productive.
The estimated $k$ and $\lambda$ values vary widely over the examples, indicating
their diversity. Note that smaller values of $\lambda$ mean a smaller expected
runtime.

Determining Portfolio Size. Consider a portfolio of IC3PARs. In general,
increasing the size of the portfolio reduces the expected time to solve a
problem. However, there are diminishing returns to adding more solvers to a
portfolio in terms of expected runtime. We now express this mathematically, and
derive a formula for computing a portfolio size to achieve a runtime with a
target probability. Consider a portfolio of $m$ IC3PAR solvers run on a specific
problem. Let $Y_i$ denote the runtime of the $i$-th IC3PAR. From the previous
discussion we know that $Y_i \sim \mathrm{WEI}(k, \lambda)$ for some $k$ and
$\lambda$. Therefore, the cdf of $Y_i$ is:
$F_{Y_i}(x) = 1 - e^{-(x/\lambda)^k}$.

Let $Y$ be the rv denoting the runtime of the portfolio. Thus, we have
$Y = \min(Y_1, \ldots, Y_m)$. More importantly, assuming the $Y_i$ are
independent, the cdf of $Y$ is:

$$F_Y(x) = 1 - (1 - F_{Y_1}(x)) \times \cdots \times (1 - F_{Y_m}(x)) = 1 - \left(e^{-(x/\lambda)^k}\right)^m = 1 - e^{-m (x/\lambda)^k} = 1 - e^{-\left(\frac{x}{\lambda / \sqrt[k]{m}}\right)^k}$$

Note that this means $Y$ is also a Weibull rv, not just when $m \to \infty$ (as
per the EVT) but for all $m$. More specifically,
$Y \sim \mathrm{WEI}(k, \lambda / \sqrt[k]{m})$. Recall that if $m = 1$, then
the expected time to solve the problem by the portfolio is $E[Y_1]$. We refer to
this time as $t^*$, the expected solving time for a single IC3PAR. Since
$Y_1 \sim \mathrm{WEI}(k, \lambda)$, it is known that
$t^* = \lambda \cdot \Gamma(1 + \frac{1}{k})$, where $\Gamma$ is the gamma
function. Now, we come to our result, which expresses the probability that a
portfolio of $m$ IC3PARs will require no more than $t^*$ to solve the problem.
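The claim that the portfolio runtime is again Weibull, with the same shape and a scale shrunk by a factor of $\sqrt[k]{m}$, is easy to check numerically. The sketch below uses hypothetical function names and assumes independent solver runtimes, as the derivation does.

```python
import math


def portfolio_cdf(x, m, k, lam):
    """P(min of m independent WEI(k, lam) runtimes <= x), via the product form."""
    single_survival = math.exp(-((x / lam) ** k))
    return 1.0 - single_survival ** m


def portfolio_scale(m, k, lam):
    """Scale of the portfolio's Weibull distribution: lam divided by the k-th root of m."""
    return lam / m ** (1.0 / k)


def expected_single_time(k, lam):
    """t* = lam * Gamma(1 + 1/k), the expected runtime of a single solver."""
    return lam * math.gamma(1.0 + 1.0 / k)
```

Since the scale only shrinks like the $k$-th root of $m$, doubling the portfolio size reduces the expected runtime by a factor of $2^{1/k}$, which is one way to see the diminishing returns mentioned above.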

Theorem  5.  Let  p{m)  be  the  probability  that  Y  <  t* .  Then  p(m)  >  1  —  e-^ 
where  7  s=s  0.57721  is  the  Euler- Mas cheroni  constant. 

Proof.  We  know  that: 

p(m)  =  Fy(t*)  =  1  —  =  1  —  ( a(k))m ,  where  a{k)  = 

Next, observe that $a(k)$ increases monotonically with $k$ but does not diverge
as $k \to \infty$. For example, Fig. 11 in Appendix B shows a plot of $a(k)$. Indeed,
it can be shown that (see Lemma 2 in Appendix B): $\lim_{k \to \infty} a(k) = e^{-1/e^{\gamma}}$. In
practice, as seen in Fig. 11 in Appendix B, the value of $a(k)$ converges quite
rapidly to this limit as $k$ increases. For example, $a(5) > 0.91 \cdot e^{-1/e^{\gamma}}$, and
$a(10) > 0.95 \cdot e^{-1/e^{\gamma}}$. Since $\forall k \,.\, a(k) < e^{-1/e^{\gamma}}$, we have our result:

$$p(m) > 1 - \left(e^{-1/e^{\gamma}}\right)^m = 1 - e^{-m/e^{\gamma}}$$
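The behaviour of $a(k)$ used in this proof can be inspected numerically. The following Python sketch is an illustration under the definition $a(k) = e^{-(\Gamma(1 + 1/k))^k}$ given above; the constant `EULER_GAMMA` and the chosen values of $k$ are assumptions of the sketch.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def a(k):
    """a(k) = exp(-(Gamma(1 + 1/k))^k), as in the proof of Theorem 5."""
    return math.exp(-(math.gamma(1.0 + 1.0 / k) ** k))

# Limit of a(k) as k grows: e^{-1/e^gamma}.
LIMIT = math.exp(-math.exp(-EULER_GAMMA))

for k in (1, 2, 5, 10, 100):
    print(f"k={k:3d}  a(k)={a(k):.4f}  a(k)/limit={a(k) / LIMIT:.4f}")
```

The printed ratios approach 1 from below, which is the property the bound relies on.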

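Achieving a Target Probability. Now suppose we want $p(m)$ to be greater than
some target probability $p$. Then, from Theorem 5, we have:

$$1 - e^{-m/e^{\gamma}} = p \implies \ln(1 - p) = -\frac{m}{e^{\gamma}} \implies m = -e^{\gamma} \ln(1 - p)$$

For example, if we want $p = 0.99999$, then $m \approx 20$. Thus, a portfolio of 20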
ic3pars  has  about  0.99999  probability  of  solving  a  problem  at  least  as  quickly  as 
the  expected  time  in  which  a  single  IC3par  will  solve  it.  We  validated  the  efficacy 
of  Theorem  5  by  comparing  its  predictions  with  empirically  observed  results 
on  the  HWMCC14  benchmarks.  Overall,  we  find  the  observed  and  predicted 
probabilities  to  agree  significantly.  Further  details  are  presented  in  Section  5.2. 
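As a worked illustration of this calculation, the Python sketch below evaluates $m = -e^{\gamma} \ln(1 - p)$ for a few target probabilities and the Theorem 5 bound for $m = 20$. The function names are hypothetical and the targets other than 0.99999 are chosen only for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def portfolio_size(p):
    """m = -e^gamma * ln(1 - p): size at which the Theorem 5 bound equals p."""
    return -math.exp(EULER_GAMMA) * math.log(1.0 - p)

def lower_bound(m):
    """Theorem 5 lower bound 1 - e^{-m/e^gamma} on the probability that Y <= t*."""
    return 1.0 - math.exp(-m / math.exp(EULER_GAMMA))

for p in (0.9, 0.99, 0.999, 0.99999):
    print(f"target p={p}: m = {portfolio_size(p):.2f}")
print(f"bound for m=20: {lower_bound(20):.7f}")
```

For $p = 0.99999$ the formula gives a value between 20 and 21, in line with $m \approx 20$.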

Speeding Up the Portfolio. To reduce the portfolio's runtime below $t^*$, we must
increase $m$ appropriately. In general, for any constant $c \in [0, 1]$, the probability
that a portfolio of $m$ IC3PAR solvers will have a runtime $\le c \cdot t^*$ is given by:

$$p(m, c, k) = 1 - e^{-m \left(c \cdot \Gamma(1 + \frac{1}{k})\right)^k}$$

For $c < 1$ we do not have a closed form for $\lim_{k \to \infty} p(m, c, k)$, unlike when $c = 1$.
However, the value of $p(m, c, k)$ is computable for fixed $m$, $c$ and $k$. Fig. 6(a) plots
$p(m, c, 4)$ for $m \in \{1, \dots, 100\}$ and $c \in \{0.4, 0.5, 0.6\}$. Fig. 6(b) plots $p(m, 0.5, k)$
for $m \in \{1, \dots, 100\}$ and $k \in \{3, 4, 5\}$. As expected, $p(m, c, k)$ increases with:
(i)  increasing  m;  (ii)  increasing  c;  and  (iii)  decreasing  k.  One  challenge  here  is 
that  we  do  not  know  how  to  estimate  k  for  a  problem  without  actually  solving 
it.  In  general,  a  smaller  value  of  k  means  that  a  smaller  portfolio  will  reach  the 
target probability. In our experiments (recall Fig. 5(a)) we observed $k$-values in
a small range (1-10) for problems from HWMCC14. These numbers can serve
as  guidelines,  and  could  be  refined  based  on  additional  experimentation. 

Fig. 6. (a) p(m, c, 4) for different values of c; (b) p(m, 0.5, k) for different values of k.
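The quantity $p(m, c, k)$ is straightforward to evaluate. The Python sketch below is an illustration only: `min_size` is a hypothetical helper that searches for the smallest portfolio reaching a target probability, and the target 0.99 is an assumed value.

```python
import math

def p(m, c, k):
    """Probability that a portfolio of m solvers finishes within c * t*."""
    return 1.0 - math.exp(-m * (c * math.gamma(1.0 + 1.0 / k)) ** k)

def min_size(target, c, k, limit=10_000):
    """Smallest m <= limit with p(m, c, k) >= target, or None if there is none."""
    for m in range(1, limit + 1):
        if p(m, c, k) >= target:
            return m
    return None

for k in (3, 4, 5):
    print(f"k={k}: p(20, 0.5, k)={p(20, 0.5, k):.3f}, "
          f"smallest m for 0.99 at c=0.5: {min_size(0.99, 0.5, k)}")
```

The required portfolio size grows quickly with $k$, which is why an estimate of $k$ matters when choosing $m$.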

5  Experimental  Results 

We implemented IC3SYNC, IC3ASYNC and IC3PROOF by modifying a publicly
available reference implementation of IC3 (https://github.com/arbrad/IC3ref),
which we call IC3REF. All propositional queries in IC3 are implemented
by calls to MINISAT. We refer to the variant of IC3REF that uses a randomized
MINISAT (invoked via IC3 -r) as IC3RND. We use IC3RND to introduce
uncertainty in the proof search by IC3 purely by randomizing the backend SAT solver.
We performed two sets of experiments: one to evaluate the effectiveness of the
parallel IC3 solvers, and another to validate our statistical analysis of their
portfolios. All our tools and results are available at http://somewhere.

Benchmarks. We constructed four benchmarks. The first was constructed by
taking the safe examples from HWMCC14 (http://fmv.jku.at/hwmcc14cav),
simplifying them with the IIMC (http://ecee.colorado.edu/wpmu/iimc) tool
(via iimc-hwmcc13 -t pp), and selecting the ones solved by IC3REF within 900s
on an 8-core 3.4GHz machine with 8GB of RAM. The remaining three benchmarks
were constructed similarly from the buggy examples from HWMCC14, and the
safe and buggy examples from the TIP benchmark
(http://fmv.jku.at/aiger/tip-aig-20061215.zip), respectively. We refer to the
four benchmarks as HWCSAFE, HWCBUG, TIPSAFE, and TIPBUG, respectively.

SAT Solver Pool. The function $f$ (cf. Figs. 1-4) is implemented by a SAT
solver (MINISAT). A separate SAT solver $S_i$ is used for each $f(i)$. Whenever
$f(i)$ changes due to the addition of a new lemma to a frame, the corresponding
solver $S_i$ is also updated by asserting the lemma. To prevent a single SAT solver
from becoming the bottleneck between competing threads, we use a “pool” of
MINISAT solvers to implement each $S_i$. The solvers are maintained in a FIFO
queue. When a thread requests a solver, the first available solver is given to it.
When a lemma is added to the pool, it is added to all available solvers, and
recorded as “pending” for the busy ones. When a busy solver is returned by a
thread, all pending lemmas are added, and the solver is inserted at the back of
the queue. We refer to the number of solvers in each pool as SPSz.

Fig. 7. Speedup of IC3SYNC, IC3ASYNC, IC3PROOF and IC3RND compared to IC3REF.
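The pool discipline described above can be sketched as follows. This is a minimal Python illustration, not the actual implementation: `StubSolver`, `SolverPool` and their methods are hypothetical names, and the stub merely records lemmas instead of calling MINISAT.

```python
import threading
from collections import deque

class StubSolver:
    """Hypothetical stand-in for a SAT solver: it only records asserted lemmas."""
    def __init__(self):
        self.lemmas = []

    def add_lemma(self, lemma):
        self.lemmas.append(lemma)

class SolverPool:
    """FIFO pool of solvers implementing one S_i; `size` plays the role of SPSz."""
    def __init__(self, size, make_solver=StubSolver):
        self._cond = threading.Condition()
        self._idle = deque(make_solver() for _ in range(size))
        self._pending = {}  # id(busy solver) -> lemmas added while it was busy

    def acquire(self):
        """Wait until a solver is idle, then hand out the one at the front."""
        with self._cond:
            while not self._idle:
                self._cond.wait()
            solver = self._idle.popleft()
            self._pending[id(solver)] = []
            return solver

    def release(self, solver):
        """Replay pending lemmas, then put the solver at the back of the queue."""
        with self._cond:
            for lemma in self._pending.pop(id(solver)):
                solver.add_lemma(lemma)
            self._idle.append(solver)
            self._cond.notify()

    def add_lemma(self, lemma):
        """Assert the lemma on idle solvers; record it as pending for busy ones."""
        with self._cond:
            for solver in self._idle:
                solver.add_lemma(lemma)
            for missed in self._pending.values():
                missed.append(lemma)

pool = SolverPool(size=3)
busy = pool.acquire()
pool.add_lemma("lemma-1")   # reaches the two idle solvers immediately
print(busy.lemmas)          # the busy solver has not seen the lemma yet
pool.release(busy)
print(busy.lemmas)          # the pending lemma was replayed on return
```

A single condition variable guards both the queue and the pending lists in this sketch, so a lemma added while a solver is being handed out cannot be missed by that solver.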


5.1  Comparing  Parallel  IC3  Variants 

These experiments were carried out on an Intel Xeon machine with 128 cores,
each running at 2.67GHz, and 1TB of RAM. For each solver S selected from
{IC3ASYNC(4), IC3SYNC(4), IC3PROOF(4), IC3RND} and each benchmark B, and
with SPSz = 3, we performed the following steps: (i) extract all problems from
B that are solved by IC3REF in at least 10s; call this set B*; (ii) solve each
problem in B* with IC3REF and also with a portfolio of 20 S solvers, and compute
the ratio of the two runtimes; this is the speedup; (iii) compute the mean and max
of the speedups over all problems in B*. Figure 7 shows the results obtained.
In all cases, we see speedup. On this particular run, IC3PROOF performs best
overall, with an average speedup of over 6 and a maximum speedup of over 300.
Note, however, that performance will vary across runs due to the unpredictability
of runtime. As in the non-portfolio case (cf. Fig. 5), IC3PROOF and IC3ASYNC
have similar performance, and are better than IC3SYNC. The pattern is followed
for both safe and buggy examples. Finally, IC3RND provides mediocre speedup
across all examples (cf. the “Max” column), indicating that parallelization
enables a broader search for proofs compared to randomizing the SAT solver.

5.2  Portfolio  Size 

To  validate  Theorem  5,  we  compared  its  predictions  to  empirically  observed 
results  as  follows  (again  using  SPSz  =  3): 

1.  Select  a  set  of  problems  -  same  as  in  Fig.  5(a)  -  from  HWMCC14,  and 
process  each  problem  as  follows. 

2. Solve the problem $b$ times using IC3PAR(4). These experiments are the same
as the ones used for Fig. 5(a). Hence $b$ is the value appearing in the second
column of Fig. 5(a). This gives a set of runtimes $t_1, \dots, t_b$. Fit these runtimes
to a Weibull distribution to obtain the estimated value of $k$ (the same as the
third column of Fig. 5(a)).

3. Compute $\bar{t} = \mathrm{mean}(t_1, \dots, t_b)$. This is the estimated average time for
IC3PAR(4) to solve the problem.

4. Pick a portfolio size $m$. Start with $m = 1$.

5. Divide $t_1, \dots, t_b$ into blocks of size $m$. Let $B = \lfloor b/m \rfloor$. We now have $B$ blocks of
runtimes $T_1, \dots, T_B$, each consisting of $m$ elements. Thus, $T_1 = \{t_1, \dots, t_m\}$,
$T_2 = \{t_{m+1}, \dots, t_{2m}\}$, and so on. For $i = 1, \dots, B$ compute $\mu_i = \min(T_i)$.
Note that each $\mu_i$ represents the runtime of a portfolio of $m$ IC3PAR(4) solvers
on the problem.

6. Let $n(m)$ be the number of blocks for which $\mu_i \le \bar{t}$, i.e.,
$n(m) = |\{i \in [1, B] \mid \mu_i \le \bar{t}\}|$. Compute $p^*(m) = \frac{n(m)}{B}$. Note that $p^*(m)$ is the
estimate of $p(m)$ based on our experiments. Compute $p(m) = 1 - (a(k))^m$
using the estimated value of $k$ from Step 2. Compute the ratio $\rho(m)$ of $p(m)$
and $p^*(m)$. We expect $\rho(m) \approx 1$.

7. Repeat steps 5 and 6 with $m = 2, \dots, 100$ to obtain the sequence
$\rho = (\rho(1), \dots, \rho(100))$. Compute the mean and standard deviation of $\rho$.

Fig. 8. Validating Theorem 5; (a) mean and standard deviation of ratios of predicted
and observed probabilities; (b) scatter plot of predicted and observed probabilities.

Fig. 8(a) shows the results of the above steps over all the selected examples.
We see that for each example, the mean of the ratios $\rho$ is very close to 1 and their
standard deviation is very close to 0, indicating that $p(m)$ and $p^*(m)$ agree closely.
Furthermore, Fig. 8(b) shows a scatter plot of all $p^*(m)$ values computed against
their corresponding $p(m)$. Note that most values are very close to the (red) $x = y$
line, as expected.
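The block-based estimate of the observed probability can be sketched in a few lines of Python. This is an illustration under stated assumptions: the runtimes are synthetic Weibull samples rather than measurements, the shape `k_true` is taken as known instead of being fitted, and all function names are hypothetical.

```python
import math
import random

def a(k):
    """a(k) = exp(-(Gamma(1 + 1/k))^k)."""
    return math.exp(-(math.gamma(1.0 + 1.0 / k) ** k))

def observed_probability(runtimes, m):
    """Fraction of size-m blocks whose minimum is at most the mean runtime."""
    t_bar = sum(runtimes) / len(runtimes)
    num_blocks = len(runtimes) // m            # B = floor(b / m)
    minima = [min(runtimes[i * m:(i + 1) * m]) for i in range(num_blocks)]
    return sum(1 for mu in minima if mu <= t_bar) / num_blocks

def predicted_probability(k, m):
    """p(m) = 1 - a(k)^m."""
    return 1.0 - a(k) ** m

# Synthetic runtimes (assumption): 2000 samples from WEI(k=4, lambda=100).
rng = random.Random(0)
k_true, lam = 4.0, 100.0
runtimes = [rng.weibullvariate(lam, k_true) for _ in range(2000)]

for m in (1, 2, 5, 10):
    print(f"m={m:2d}  observed={observed_probability(runtimes, m):.3f}  "
          f"predicted={predicted_probability(k_true, m):.3f}")
```

On data that really is Weibull distributed, the two columns should be close, mirroring the comparison reported in Fig. 8.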


5.3  Parameter  Sweeping 

In this section we evaluate the performance of IC3PROOF when selecting different
combinations of IC3PAR parameters. We observed in Section 5.1 that the variants of
IC3PAR each have a chance of being the best solver for different benchmarks.
From previous work utilizing the portfolio technique
[20,17,12,14,19,16], we see that using a suite of heterogeneous solvers would
likely be successful. With this inspiration, we determined that running portfolios
of IC3PAR in different configurations could be successful. This effort would also
help to better characterize the behavior of IC3PAR.

Fig. 9. IC3PROOF speedup on three benchmarks compared to IC3REF. A more intense
cell indicates that the corresponding combination of IC3PAR parameters solves the
benchmark faster.

IC3PAR has two parameters: the number of threads running a copy of IC3,
and SPSz. We identify an instance of IC3PAR run with these parameters as
IC3PAR(i, s), where i is the number of IC3 threads and SPSz = s. Thus,
IC3PROOF(4, 3) was used in all previous experiments.

Conjecture 2. In the absence of knowledge of optimal parameter values for
IC3PAR(i, s), a heterogeneous portfolio using random feasible parameter values
(I, S) will, on average, yield faster IC3PAR performance than a homogeneous
portfolio with constant parameter values (i, s) = (4, 3), where I and S are
discrete random variables taking values in {1, 2, ..., 8}.

To investigate Conjecture 2, we estimated the speedup over IC3 for portfolios
of IC3PROOF(i, s) and IC3PROOF(I, S) as follows:

1. Select a benchmark from B and time its performance with IC3.

2. Time 100 runs of IC3PROOF in each of the 64 possible parameter
combinations IC3PROOF(I, S) to empirically characterize the random running time
distribution across the parameter space.

3. Randomly select 100 portfolio blocks, each consisting of 20 run times
of IC3PROOF(i, s), from the 6,400 recorded running times covering
IC3PROOF(I, S) collected in Step 2. Take the minimum of each block as the
portfolio time, and average the 100 minimums.

4. Select the 5 portfolio blocks of size 20 from the 100 runs of IC3PROOF(4, 3)
which were performed as part of Step 2. Take the minimum of each block as the
portfolio time, and average the 5 minimums.

For this investigation we constructed a portfolio simulator which used the
running times gathered from up to 6400 tests per benchmark and constructed
ex post facto portfolios by selecting running times from the desired parameter
configuration. We utilized over 11,000 hours of compute time across 11
dual-processor machines with Intel(R) Xeon(R) 2.40GHz CPUs for a total of 176 cores.

Fig. 10. Speedup via parameter sweeping.


Summarizing graphics were produced for visualization of performance across
the parameter space (see Fig. 9). The visualizations and simulated results
presented evidence in favor of Conjecture 2, as speedup patterns across the
parameter space were varied for all of the selected benchmark examples and every
simulated IC3PROOF(I, S) portfolio ran faster than simulated IC3PROOF(4, 3) portfolios.
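A portfolio simulator of the kind described in this section can be sketched as follows. This Python sketch is an assumption-laden illustration, not the simulator used in the experiments: the running times are synthetic, the function names are hypothetical, and random blocks are drawn independently of one another (so different blocks may share running times).

```python
import random

def simulate_portfolios(times, num_blocks, block_size, rng):
    """Average, over randomly drawn blocks, of the minimum running time per block."""
    minima = [min(rng.sample(times, block_size)) for _ in range(num_blocks)]
    return sum(minima) / num_blocks

def homogeneous_portfolios(times, block_size):
    """Average of the minima of consecutive, disjoint blocks of one configuration."""
    starts = range(0, len(times) - block_size + 1, block_size)
    blocks = [times[i:i + block_size] for i in starts]
    return sum(min(b) for b in blocks) / len(blocks)

# Synthetic data, NOT measurements: 64 configurations (i, s), 100 runs each.
rng = random.Random(1)
runs = {(i, s): [rng.weibullvariate(50.0 + 10.0 * abs(i - s), 3.0) for _ in range(100)]
        for i in range(1, 9) for s in range(1, 9)}
all_times = [t for ts in runs.values() for t in ts]   # 6,400 running times

hetero = simulate_portfolios(all_times, num_blocks=100, block_size=20, rng=rng)
homo = homogeneous_portfolios(runs[(4, 3)], block_size=20)   # 5 blocks of 20
print(f"heterogeneous: {hetero:.2f}  homogeneous (4,3): {homo:.2f}")
```

Dividing the sequential solver's time by either average gives the simulated portfolio speedup for that benchmark.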

To attempt to validate the conjecture, actual heterogeneous IC3PAR(I, S)
portfolios were run on the 10 selected benchmarks from Section 4. Each portfolio was
run at least XXX times, and the average portfolio times were then compared.
These results (shown in Figure 10) show that, averaged across these 10 examples,
IC3PAR(I, S) is as fast as any single IC3PAR variant. The limited amount of
data collected to validate this conjecture does not support any strong claims, but
from what we have observed, using heterogeneous portfolios of IC3PROOF gives
the same speedup as picking the best possible IC3PAR variant. The advantage of
this technique is for a new problem, when the strongest performing variant
cannot be known ahead of time.

6  Conclusion 

We  present  three  ways  to  parallelize  IC3.  Each  variant  uses  a  number  of  threads 
to  speed  up  the  computation  of  an  inductive  invariant  or  a  CEX,  sharing  lemmas 
to  minimize  duplicated  effort.  They  differ  in  the  degree  of  synchronization  and 
technique  to  detect  if  an  inductive  invariant  has  been  found.  The  runtime  of 
these  solvers  is  unpredictable,  and  varies  with  thread-interleaving.  We  explore 
the  use  of  portfolios  to  counteract  the  runtime  variance.  Each  solver  in  the 
portfolio  potentially  searches  for  a  different  proof/CEX.  The  first  one  to  succeed 
provides  the  solution.  Using  the  Extreme  Value  theorem  and  statistical  analysis, 
we construct a formula that gives us a portfolio size for solving a problem within
a target time bound with a certain probability. Experiments on HWMCC14
benchmarks show that the combination of parallelization and portfolios yields
an average speedup of 6x over IC3, and in some cases speedups of over 300. An
important  area  of  future  work  is  the  effectiveness  of  parallelization  and  portfolios 
in  the  context  of  software  verification  via  a  generalization  of  IC3  [10]. 

A  Proof of Correctness of IC3 and its Parallel Versions


We  begin  with  a  useful  Lemma. 

Lemma 1. Suppose there exists an index $i$ such that the following hold:

$$\alpha_1 : I \Rightarrow f(i) \qquad \alpha_2 : f(i) \wedge T \Rightarrow f(i+1)' \qquad \alpha_3 : f(i) \wedge T \Rightarrow S' \qquad \alpha_4 : f(i) = f(i+1) \qquad \alpha_5 : I \Rightarrow S$$

Then $Post^{0+}(I) \subseteq S$.

Proof. Since $Post(\cdot)$ is monotonic:

$$\alpha_1 \wedge \alpha_2 \wedge \alpha_4 \Rightarrow Post(I) \subseteq Post(f(i)) \Rightarrow Post(I) \subseteq f(i+1) \Rightarrow Post(I) \subseteq f(i)$$

From this, applying $Post(\cdot)$ again, we get:

$$\alpha_1 \wedge \alpha_2 \wedge \alpha_4 \Rightarrow Post^2(I) \subseteq Post(f(i)) \Rightarrow Post^2(I) \subseteq f(i+1) \Rightarrow Post^2(I) \subseteq f(i)$$

Since we can continue arbitrarily many times like this, we have:

$$\alpha_1 \wedge \alpha_2 \wedge \alpha_4 \Rightarrow Post^{1+}(I) \subseteq Post(f(i))$$

From the above and $\alpha_3$, we have $Post^{1+}(I) \subseteq S$. Also, from $\alpha_5$, we know that
$Post^0(I) \subseteq S$. Hence, $Post^{0+}(I) \subseteq S$, which is what we want. □

We now prove the theorems in Section 3.

Theorem 1. If IC3() returns $\top$, then the problem is safe. If IC3() returns $\bot$,
then the problem is unsafe.

Proof. If IC3() returns $\bot$, strengthen() sets bug to $\top$. Hence, there exists a
sequence $((m_0, 0), (m_1, 1), \dots, (m_{K-2}, K-2))$ such that:

$$m_0 \models I \;\wedge\; m_{K-2} \models \neg S \;\wedge\; \forall i \in [0, K-2) \,.\, m_i \wedge T \wedge m'_{i+1} \not\Rightarrow \bot \qquad (1)$$

This sequence leads to a counterexample. The problem is unsafe. If IC3() returns
$\top$, from lines 27-28 we have:

$$\exists i \in [1, K-2] \,.\, F[i] = \emptyset \;\Rightarrow\; \exists i \in [1, K-2] \,.\, f(i) = f(i+1)$$

This, together with I3, the check for base cases and Lemma 1 implies that
$Post^{0+}(I) \subseteq S$. The problem is safe. □
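Lemma 1, on which this proof relies, can be illustrated on an explicit-state system, reading each formula as the set of states that satisfy it. The Python sketch below is only an illustration; the five-state system and all names in it are hypothetical.

```python
def post(states, transitions):
    """One-step successors of a set of states."""
    return {t for (s, t) in transitions if s in states}

def reachable(init, transitions):
    """All states reachable from init in zero or more steps."""
    seen, frontier = set(init), set(init)
    while frontier:
        frontier = post(frontier, transitions) - seen
        seen |= frontier
    return seen

def lemma1_premises(init, transitions, safe, f_i, f_next):
    """Set-based reading of alpha_1 .. alpha_5 for an explicit-state system."""
    return (init <= f_i                              # alpha_1: I => f(i)
            and post(f_i, transitions) <= f_next     # alpha_2: f(i) & T => f(i+1)'
            and post(f_i, transitions) <= safe       # alpha_3: f(i) & T => S'
            and f_i == f_next                        # alpha_4: f(i) = f(i+1)
            and init <= safe)                        # alpha_5: I => S

# Toy system (hypothetical): states 0..4, state 4 is the only unsafe one.
init = {0}
transitions = {(0, 1), (1, 2), (2, 0), (3, 4)}
safe = {0, 1, 2, 3}
frame = {0, 1, 2}

print(lemma1_premises(init, transitions, safe, frame, frame))
print(reachable(init, transitions) <= safe)
```

When the premises hold for some frame, every reachable state is safe, as the lemma states; a frame that also contains state 3 fails $\alpha_2$ because its successor, state 4, lies outside the frame.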


Theorem 2. If IC3Sync() returns $\top$, then the problem is safe. If IC3Sync()
returns $\bot$, then the problem is unsafe.

Proof. If IC3Sync() returns $\bot$, then some call to strengthen($F_i$, $K$) returns
with bug = $\top$. As in the case of IC3, this implies that there exists a sequence
$((m_0, 0), (m_1, 1), \dots, (m_{K-2}, K-2))$ such that:

$$m_0 \models I \;\wedge\; m_{K-2} \models \neg S \;\wedge\; \forall i \in [0, K-2) \,.\, m_i \wedge T \wedge m'_{i+1} \not\Rightarrow \bot \qquad (2)$$

This sequence leads to a counterexample. The problem is unsafe. If IC3Sync()
returns $\top$, then from lines 85-86, we have
$\exists j \in [1, K-2] \,.\, \forall i \in [1, n] \,.\, F_i[j] = \emptyset \Rightarrow \exists j \in [1, K-2] \,.\, f(j) = f(j+1)$.
This, together with I6, the check for
base cases and Lemma 1 implies that $Post^{0+}(I) \subseteq S$. The problem is safe. □

Theorem 3. If IC3Async() returns $\top$, then the problem is safe. If IC3Async()
returns $\bot$, then the problem is unsafe.

Proof. If IC3Async() returns $\bot$, then some call to strengthen() returns with
bug = $\top$. As in the case of IC3, this implies that there exists a sequence
$((m_0, 0), (m_1, 1), \dots, (m_{K-2}, K-2))$ such that:

$$m_0 \models I \;\wedge\; m_{K-2} \models \neg S \;\wedge\; \forall i \in [0, K-2) \,.\, m_i \wedge T \wedge m'_{i+1} \not\Rightarrow \bot \qquad (3)$$

This sequence leads to a counterexample. The problem is unsafe. If IC3Async()
returns $\top$, then from lines 112-113, we have
$\exists j \in [1, K-2] \,.\, \forall i \in [1, n] \,.\, F_i[j] = \emptyset \Rightarrow \exists j \in [1, K-2] \,.\, f(j) = f(j+1)$.
This, together with I9, the check for
base cases and Lemma 1 implies that $Post^{0+}(I) \subseteq S$. The problem is safe. □

Theorem 4. If IC3Proof() returns $\top$, then the problem is safe. If IC3Proof()
returns $\bot$, then the problem is unsafe.

Proof. If IC3Proof() returns $\bot$, then some call to strengthen() returns with
bug = $\top$. As in the case of IC3, this implies that there exists a sequence
$((m_0, 0), (m_1, 1), \dots, (m_{K-2}, K-2))$ such that:

$$m_0 \models I \;\wedge\; m_{K-2} \models \neg S \;\wedge\; \forall i \in [0, K-2) \,.\, m_i \wedge T \wedge m'_{i+1} \not\Rightarrow \bot \qquad (4)$$

This sequence leads to a counterexample. The problem is unsafe. If IC3Proof()
returns $\top$, then some call to propProof() returns with safe = $\top$. Then, from
the  check  at  lines  163,  and  the  fact  that  C\  A  O4  holds  at  line  163,  we  know  that 
$\Pi$ is an inductive invariant that implies $S$. The problem is safe. □

Theorem  5.  All  three  parallel  variants  of  IC3  terminate  on  all  inputs. 

Proof. Recall the function $f$ from Figs. 2-4. For any index $i$, let $|f(i)|$ denote
the number of satisfying solutions of $f(i)$. Let us write $K^*$ to mean $K$ in the
case of IC3SYNC, and $\max(K_1, \dots, K_n)$ in the case of IC3ASYNC and IC3PROOF.
It can be shown that the following is an invariant of all IC3PARs:

$$|f(0)| = |I| \;\wedge\; \forall j \in [1, K^* - 1] \,.\, |f(j-1)| \le |f(j)|$$

In other words, $f(0)$ has exactly the same number of solutions as the initial
states, and the number of solutions of $f(j)$ grows monotonically with $j$. Suppose
an execution of IC3PAR does not terminate. Then we must reach a point where
$K^* > 2^{|V|}$ and $\forall j \in [1, K^* - 1] \,.\, \exists i \in [1, n] \,.\, F_i[j] \ne \emptyset$. But this means that
$\forall j \in [1, K^* - 1] \,.\, |f(j-1)| < |f(j)|$. Since $|I| > 0$ (otherwise the algorithm
terminates with the check for base cases), we have $|f(K^* - 1)| > 2^{|V|}$. This is
absurd since there cannot be more than $2^{|V|}$ solutions to any formula over $V$. □

B  Statistical  Analysis  of  ic3par  Portfolios 


Fig.  11.  Plot  of  a(k)  against  k. 


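Lemma 2. Let $a(k) = e^{-\left(\Gamma(1 + \frac{1}{k})\right)^k}$. Then $\lim_{k \to \infty} a(k) = e^{-1/e^{\gamma}}$.

Proof. It suffices to show that:

$$\lim_{k \to \infty} \left(\Gamma(1 + \tfrac{1}{k})\right)^k = e^{-\gamma}$$

or, equivalently:

$$\lim_{k \to \infty} k \cdot \ln(\Gamma(1 + \tfrac{1}{k})) = -\gamma$$

Using the result (see footnote 2):

$$\ln(\Gamma(1 + z)) = -\gamma \cdot z + \sum_{n=2}^{\infty} \frac{\zeta(n)}{n} \cdot (-z)^n, \quad \text{if } |z| < 1$$

where $\zeta(n)$ is the Riemann zeta function, we get:

$$\lim_{k \to \infty} k \cdot \ln(\Gamma(1 + \tfrac{1}{k})) = \lim_{k \to \infty} k \cdot \left(-\gamma \cdot \frac{1}{k} + \sum_{n=2}^{\infty} \frac{\zeta(n)}{n} \cdot \left(-\frac{1}{k}\right)^n\right)$$

$$= -\gamma + \lim_{k \to \infty} k \sum_{n=2}^{\infty} \frac{\zeta(n)}{n} \cdot \left(-\frac{1}{k}\right)^n$$

$$= -\gamma + \lim_{k \to \infty} \sum_{n=2}^{\infty} \frac{\zeta(n)}{n} \cdot \frac{(-1)^n}{k^{n-1}}$$

Since $\lim_{k \to \infty} \frac{1}{k^{n-1}} = 0$ for $n \ge 2$, we immediately get our result:

$$\lim_{k \to \infty} k \cdot \ln(\Gamma(1 + \tfrac{1}{k})) = -\gamma$$

2 This result is mentioned at https://en.wikipedia.org/wiki/Gamma_function.
It can be derived from another result (equation 20 on page 621) in the
following paper: Wrench, J. W. Jr. "Concerning Two Series for the Gamma
Function." Math. Comput. 22, 617-626, 1968. The paper is available at
http://www.ams.org/journals/mcom/1968-22-103/S0025-5718-1968-0237078-4/S0025-5718-1968-0237078-4.pdf.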
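The limit in Lemma 2 can also be observed numerically. The Python sketch below is an illustration only; it evaluates $k \cdot \ln(\Gamma(1 + \frac{1}{k}))$ with the standard library's log-gamma function and compares the gap to $-\gamma$ with the leading series term $\frac{\zeta(2)}{2k} = \frac{\pi^2}{12k}$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def scaled_log_gamma(k):
    """k * ln(Gamma(1 + 1/k)), the quantity whose limit Lemma 2 computes."""
    return k * math.lgamma(1.0 + 1.0 / k)

for k in (10, 100, 1000, 10_000):
    gap = scaled_log_gamma(k) + EULER_GAMMA      # distance from the limit -gamma
    leading = math.pi ** 2 / (12.0 * k)          # zeta(2) / (2k), the n = 2 term
    print(f"k={k:6d}  k*lnGamma(1+1/k)={scaled_log_gamma(k):.6f}  "
          f"gap={gap:.2e}  leading term={leading:.2e}")
```

The gap shrinks roughly like $1/k$, consistent with every term of the series vanishing as $k \to \infty$.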


